A Japanese Chess Commentary Corpus

Authors

  • Shinsuke Mori
  • John Richardson
  • Atsushi Ushiku
  • Tetsuro Sasada
  • Hirotaka Kameko
  • Yoshimasa Tsuruoka

Abstract

In recent years there has been a surge of interest in natural language processing related to the real world, such as symbol grounding, language generation, and nonlinguistic data search by natural language queries. To concentrate on language ambiguities, we propose to use a well-defined “real world,” namely game states. We built a corpus of pairs, each consisting of a sentence and a game state. The game we focus on is shogi (Japanese chess). We collected 742,286 commentary sentences in Japanese. Unlike the natural language annotations in many image datasets, which are provided by human workers on Amazon Mechanical Turk, these sentences were generated spontaneously. We defined domain-specific named entities, manually segmented 2,508 sentences into words, and annotated each word with a named entity tag. We give a detailed definition of the named entities and present some statistics of our game commentary corpus. We also report the results of word segmentation and named entity recognition experiments. The accuracies are as high as those on general-domain texts, indicating that we are ready to tackle various new problems related to the real world.
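
The abstract describes sentences that were manually segmented into words, with each word given a named entity tag, but it does not state how those annotations feed a recognizer. As a minimal sketch, assuming the word-level tags are converted to character-level BIO labels (a common setup for Japanese NER, since Japanese text has no spaces, and not necessarily the setup used in this work), the conversion could look like the following. The tag names `Turn` and `Piece` and the function name `words_to_char_bio` are hypothetical placeholders, not the paper's actual tag set or code.

```python
from typing import List, Tuple


def words_to_char_bio(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert word-level NE tags into character-level BIO labels.

    Assumption (hypothetical scheme): each tagged word is its own entity span.
    A word tagged "O" gives "O" to every character; a word tagged T gives
    "B-T" to its first character and "I-T" to the remaining characters.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    labels: List[Tuple[str, str]] = []
    for word, tag in zip(words, tags):
        for i, ch in enumerate(word):
            if tag == "O":
                label = "O"
            elif i == 0:
                label = f"B-{tag}"
            else:
                label = f"I-{tag}"
            labels.append((ch, label))
    return labels


if __name__ == "__main__":
    # Hypothetical annotated commentary sentence: "先手は角を打った"
    # ("Black dropped a bishop"), with made-up tag names.
    words = ["先手", "は", "角", "を", "打っ", "た"]
    tags = ["Turn", "O", "Piece", "O", "O", "O"]
    for ch, label in words_to_char_bio(words, tags):
        print(ch, label)
```

Labeling at the character level lets a tagger run on raw, unsegmented sentences. The trade-off is that an entity spanning several words would need an extra rule for continuing the `I-` label across word boundaries, which this sketch does not handle.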


Similar resources

Learning Positional Features for Annotating Chess Games: A Case Study

By developing an intelligent computer system that will provide commentary of chess moves in a comprehensible, user-friendly and instructive way, we are trying to use the power demonstrated by the current chess engines for tutoring chess and for annotating chess games. In this paper, we point out certain differences between the computer programs which are specialized for playing chess and our pr...

Auditory memory function in expert chess players

Background: Chess is a game that involves many aspects of high level cognition such as memory, attention, focus and problem solving. Long term practice of chess can improve cognition performances and behavioral skills. Auditory memory, as a kind of memory, can be influenced by strengthening processes following long term chess playing like other behavioral skills because of common processing pat...

Domain Specific Named Entity Recognition Referring to the Real World by Deep Neural Networks

In this paper, we propose a method for referring to the real world to improve named entity recognition (NER) specialized for a domain. Our method adds a stacked autoencoder to a text-based deep neural network for NER. We first train the stacked auto-encoder only from the real world information, then the entire deep neural network from sentences annotated with NEs and accompanied by real world i...

Picking the Amateur's Mind - Predicting Chess Player Strength from Game Annotations

Results from psychology show a connection between a speaker’s expertise in a task and the language he uses to talk about it. In this paper, we present an empirical study on using linguistic evidence to predict the expertise of a speaker in a task: playing chess. Instructional chess literature claims that the mindsets of amateur and expert players differ fundamentally (Silman, 1999); psychologic...

Can Symbol Grounding Improve Low-Level NLP? Word Segmentation as a Case Study

We propose a novel framework for improving a word segmenter using information acquired from symbol grounding. We generate a term dictionary in three steps: generating a pseudo-stochastically segmented corpus, building a symbol grounding model to enumerate word candidates, and filtering them according to the grounding scores. We applied our method to game records of Japanese chess with commentar...

Publication date: 2016